A Model-Based Experiment towards an Emotional Synthesis

Author

  • Jonas Lindh
Abstract

The most successful methods for inducing emotions in state-of-the-art unit selection speech synthesis switch speech databases depending on the desired emotion. These methods require substantially more memory than a single database and are computationally slow. The model-based approach is an attempt to reshape a neutrally recorded utterance (comparable to the desired output from a modern unit selection system) so that it simulates a recorded model of a desired emotion. Factors for manipulating duration, amplitude and formant shift ratio are calculated by comparing the recorded neutral utterance with three recorded models of basic emotions in accordance with discrete emotion theory: sadness, happiness and anger. F0 (regarded as the intonation) is copied from the model and then imposed on the neutrally recorded utterance. The evaluation of the experiment shows that subjects easily categorize discrete emotions in a forced-choice task. For the male voice, they also rate the emotional quality resynthesized from the neutrally recorded utterance almost as high as that of the naturally recorded models. The female voice created more difficulties and contained more synthetic artifacts, i.e. it was judged to have a lower quality than the recorded models.

Background and Introduction

Creating emotional synthesis has been a research area for quite some time. Formant speech synthesis is easily distinguished from human speech, not only because of its underdeveloped naturalness but also because of its lack of expressiveness. Several attempts have been made to implement emotions in formant synthesis (Cahn, 1988; 1989; 1990; Carlson et al., 1992). When dealing with emotional content in speech, the point of departure is almost always the neutral utterance. What is neutral speech, i.e. speech without emotions? Normally, neutral speech is thought of as a carrier that is modulated to reveal the emotions being communicated.
Such a concept is rather useful when it comes to synthesizing expressive speech. One simply treats the relationship as a hierarchy in which the abstract underlying expression is neutral and the surface expressions are the emotions we want to induce, in this case the basic emotions from discrete emotion theory: anger, sadness and happiness (Levenson, 1994; Laukka, 2004; Tatham & Morton, 2004; Narayanan & Alwan, 2004). A modern state-of-the-art unit selection speech synthesizer normally produces a sentence as neutrally as possible in order to avoid undesired side effects or miscommunication. Neutral in this case means near monotone, or containing as few speech fluctuations as possible. This is not always desirable, for example in dialogue systems. To judge whether a system succeeds in expressing a certain emotion or desire, it is obviously also important to study how well people in general succeed in communicating emotions.

The development of conversational systems has increased, meaning that understandable, neutral synthetic speech is barely acceptable any longer. Some success has been reached, but the best systems still depend too much on stored data, including a separate emotional speech database (Bulut et al., 2002). The most successful attempts to synthesize emotions have used additional speech databases containing only recordings of utterances that represent specific emotions (this applies to concatenative/unit selection synthesis systems). The system has to be able to switch database when a specific emotion is desired. It may also have to use different algorithms or analyses for the different databases, since the acoustic content might differ significantly. The databases needed for such a system also mean a substantial increase in the data to choose from. A simpler and computationally more efficient method is to induce rules for expressive speech and resynthesize an utterance produced by the system.
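
As a rough illustration of the comparison step described in the abstract, the following Python sketch computes manipulation factors for duration, amplitude and formant shift from a neutral recording and an emotional model. The text only says that the factors are obtained by comparing the two recordings, so the model-over-neutral ratios used here are an assumption. The `Utterance` container, the function names and all numbers are hypothetical and are not taken from the study.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Utterance:
    """Hypothetical container for measurements from one recording."""
    duration_s: float              # total duration in seconds
    samples: Sequence[float]       # waveform samples on a linear scale
    formants_hz: Sequence[float]   # mean formant frequencies, e.g. F1..F3


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def manipulation_factors(neutral: Utterance, model: Utterance) -> Dict[str, float]:
    """Assumed definition: each factor is the model value divided by the neutral value."""
    duration = model.duration_s / neutral.duration_s
    amplitude = rms(model.samples) / rms(neutral.samples)
    # Average the per-formant ratios to get a single shift ratio.
    ratios = [m / n for m, n in zip(model.formants_hz, neutral.formants_hz)]
    formant_shift = sum(ratios) / len(ratios)
    return {
        "duration": duration,
        "amplitude": amplitude,
        "formant_shift": formant_shift,
    }


# Illustrative values only, not measurements from the experiment.
neutral = Utterance(2.0, [0.1, -0.1, 0.1, -0.1], [500.0, 1500.0, 2500.0])
sad = Utterance(2.5, [0.05, -0.05, 0.05, -0.05], [475.0, 1425.0, 2375.0])
factors = manipulation_factors(neutral, sad)
```

With these made-up inputs, the sad model is longer, quieter and has slightly lowered formants compared with the neutral utterance, so the duration factor is above one and the other two are below one. Copying the F0 contour from the model onto the neutral utterance is a separate step and is not covered by this sketch.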
Nowadays, most unit selection systems are created by recording a single professional speaker and then using specified parts (nor-




Publication date: 2005